(57) ABSTRACT

The present invention discloses a switching apparatus and method using bandwidth decomposition, applying a von Neumann algorithm, a Birkhoff theorem, a Packetized Generalized Processor Sharing algorithm, a water filling algorithm and a dynamically calculating rate algorithm in packet switching of a high speed network. The switching apparatus and method using bandwidth decomposition according to the present invention need neither internal speedup nor the determination of a maximal matching between input ports and output ports. As a result, a network using the present apparatus and method runs faster, and the present invention can be easily manufactured with current VLSI technology.

17 Claims, 3 Drawing Sheets 
SWITCHING APPARATUS AND METHOD USING BANDWIDTH DECOMPOSITION

BACKGROUND OF THE INVENTION

1. Field of the Invention

The present invention relates to a packet switching apparatus and method applied in a network system, and particularly to a switching apparatus and method with rate guarantees and without internal speedup, using bandwidth decomposition.

2. Description of the Related Art

FIG. 1 is a schematic diagram of a well-known 4x4 input-buffered crossbar switch, wherein one end of the crossbar switch 11 includes a plurality of input ports, and each input port includes an input buffer 12. The input buffers 12 are used to store packets entering the input ports and to prevent losing packets while the crossbar switch 11 is busy. The other end of the crossbar switch 11 is connected to a plurality of output ports. There is a controller (not shown) at the intersection of each column and each row of the crossbar switch 11 to control the direction of data flow. As shown in FIG. 1, for example, a connecting point 13 represents a corresponding controller in the on position; the first input port is connected to the fourth output port, the second input port is connected to the second output port, the third input port is connected to the first output port, and the fourth input port is connected to the third output port. If logic one represents the on connection and logic zero represents the off connection, a permutation matrix can be derived to represent the above connection pattern. If the cycle time, in which a fixed number of packets are transferred by the crossbar switch 11, is divided into a plurality of time slots such that the transfer of one packet between any input port and any output port occurs within exactly one time slot, then packet transfers are synchronized to the time slots. The key point is to find out what the connection patterns of the crossbar switch 11 are in each time slot.

Prior art uses internal speedup inside the crossbar switch 11 to reach 100% throughput. In other words, the speed of packet switching must be faster than the speed of packet transference, and the ratio between them is about 2 or even more. Besides, a maximal matching between input packets and output ports of the crossbar switch 11 must be determined within every time slot in order to output the greatest number of packets within every time slot. Because a maximal matching algorithm is executed within every time slot, the speed of the crossbar switch 11 cannot be increased enough for application in current high speed networks.

Another kind of crossbar switch without internal speedup is disclosed by A. Hung, G. Kesidis and N. McKeown, "ATM input-buffered switches with guaranteed-rate property," Proc. IEEE ISCC98, Athens, pp. 331-335, 1998, which uses a weighted round robin algorithm to derive rate guarantees and 100% output utilization. The above-mentioned crossbar switch must define a frame length beforehand, and packs a constant number of input packets inside the frame. When the frame size is too large, the packet delay increases and a lot of memory is needed to store all the connection patterns during the cycle time of the crossbar switch. When the frame size is too small, the utilization of the bandwidth decreases.

As mentioned above, the crossbar switches applied in current network transmission do not completely satisfy the needs of the market.

SUMMARY OF THE INVENTION

Accordingly, an objective of the present invention is to resolve the drawbacks of the prior art, namely the need for internal speedup and the low utilization of the crossbar switch. In order to accomplish the objective, the present invention proposes a switching apparatus applied in packet switching of a network system using bandwidth decomposition. The present invention also proposes a scheduling algorithm applied in an input-buffered crossbar switch. The present invention has the following characteristics:

(1) it is not necessary to speed up inside the present switching apparatus using bandwidth decomposition;
(2) it is not necessary to determine a maximal matching between input packets and output ports within every time slot;
(3) it is not necessary to define a frame length;
(4) the switching apparatus using bandwidth decomposition according to the present invention can reach 100% utilization of output rate;
(5) the present switching apparatus using bandwidth decomposition can provide quality of service (QoS) in network transmission, such as bounds on packet delay and on the queue length of input buffers;
(6) the present switching apparatus using bandwidth decomposition provides different service qualities for clients with different service grades;
(7) in practical application, the present switching apparatus using bandwidth decomposition can be implemented by a hardware circuit, especially formed as a single chip embedded on the motherboard of a switching machine, such as a hub or a switch.

The present invention proposes a switching apparatus applied in packet switching of a network system using bandwidth decomposition. An element $r_{ij}$ in the rate matrix $R=(r_{ij})$ represents the input rate assigned to the traffic from the i-th input port to the j-th output port of an NxN input-buffered crossbar switch. The apparatus aspect of the present invention mainly comprises a rate-measuring mechanism, a plurality of input ports, a crossbar switch and a processing mechanism. The rate-measuring mechanism is used to dynamically measure the input rate of the present switching apparatus. The plurality of input ports, connected to said rate-measuring mechanism, include a plurality of storing devices for storing input packets. The crossbar switch, connected to said plurality of input ports, is used to transfer said plurality of input packets to the plurality of output ports of said switching apparatus using bandwidth decomposition. The processing mechanism, connected to said rate-measuring mechanism, is used to transform said rate matrix into connection patterns of said crossbar switch in each time slot of the cycle time.

The method aspect of the present invention mainly comprises the following steps: the step of using a von Neumann algorithm to transform the rate matrix $R=(r_{ij})$ of an NxN input-buffered crossbar switch into a doubly stochastic matrix $\tilde{R}$; the step of using a Birkhoff theorem to decompose said doubly stochastic matrix into a linear combination of a plurality of permutation matrices, each of said permutation matrices corresponding to a connection pattern of said crossbar switch; and the step of using a Packetized Generalized Processor Sharing algorithm to set up a connection pattern of said crossbar switch in each time slot of the cycle time.

BRIEF DESCRIPTION OF THE DRAWINGS

The invention will be described according to the appended drawings in which:
FIG. 1 is a schematic diagram of a well-known 4x4 input-buffered crossbar switch;

FIG. 2 is a schematic diagram of the switching apparatus using bandwidth decomposition according to a preferred embodiment of the present invention;

FIG. 3 is a structure diagram of the control unit in FIG. 2 according to a preferred embodiment of the present invention;

FIG. 4 is a structure diagram of the selecting mechanism in FIG. 3 according to a preferred embodiment of the present invention;

FIG. 5 is a timing diagram according to the present invention; and

FIG. 6 is a flow diagram of a water filling procedure for the switching apparatus using bandwidth decomposition according to the present invention.

PREFERRED EMBODIMENT OF THE PRESENT INVENTION

For convenience, let $r_{ij}$ represent the input rate from the i-th input port to the j-th output port of an NxN input-buffered crossbar switch, with the following relationships:

$$\sum_{i=1}^{N} r_{ij} \le 1, \quad \text{for every } j \qquad (1)$$

$$\sum_{j=1}^{N} r_{ij} \le 1, \quad \text{for every } i \qquad (2)$$

Inequality (1) and inequality (2) are called "no overbooking conditions", and mean that neither the total rate to an output port nor the total rate coming out from an input port can be larger than one.

Let the matrix $R=(r_{ij})$ represent a rate matrix. If $R$ satisfies inequality (1) and inequality (2), $R$ is called a "doubly substochastic matrix". If a matrix $\tilde{R}=(\tilde{r}_{ij})$ satisfies inequality (1) and inequality (2) with equality, $\tilde{R}$ is called a "doubly stochastic matrix".

If a demanded-rate matrix satisfies the definition of a doubly substochastic matrix, all input packets can be sent to the output ports with no extra latency. A doubly stochastic matrix is regarded as the supplied-rate matrix of the crossbar switch 11 relative to the demanded-rate matrix. Every element of the doubly stochastic matrix is not less than the element at the same index of the doubly substochastic matrix; this assures that the rate supply of the crossbar switch 11 is not less than the rate demand, so that a rate guarantee satisfying the demand of all input ends can be supplied.

The present invention uses the well-known von Neumann theorem and algorithm to find a doubly stochastic matrix from a doubly substochastic matrix. The von Neumann theorem can be seen in "Inequalities: Theory of Majorization and Its Applications," by Albert W. Marshall and Ingram Olkin, ACADEMIC PRESS, 1979.

von Neumann Theorem

If a matrix $R=(r_{ij})$ is doubly substochastic, then there exists a doubly stochastic matrix $\tilde{R}=(\tilde{r}_{ij})$ such that $r_{ij} \le \tilde{r}_{ij}$ for every i and j.

This can be constructed by the following algorithm:

Algorithm 1: von Neumann Algorithm

(a1) If the sum of all the elements in the rate matrix R is less than N, then there exists a specific element (i, j) such that the sum of all the elements in the same row as the specific element and the sum of all the elements in the same column as the specific element are both less than one.

(a2) Let

$$\epsilon = 1 - \max\left(\sum_{l=1}^{N} r_{il},\ \sum_{l=1}^{N} r_{lj}\right),$$

that is, $\epsilon$ is the value obtained by subtracting the larger of the i-th row sum and the j-th column sum of the rate matrix from one. Adding the value $\epsilon$ to the element having index (i, j) in the rate matrix generates a new matrix $R_1$. Then in $R_1$ the number of row sums and column sums that are strictly smaller than one is at least one less than that in the rate matrix R.

(a3) Repeat step (a1) and step (a2) until a doubly stochastic matrix $\tilde{R}$ is obtained.

After a doubly stochastic matrix $\tilde{R}$ is found, the Birkhoff theorem as follows is used to decompose $\tilde{R}$ into a linear combination of a plurality of permutation matrices. The sum of the coefficients in the linear combination is one, and every permutation matrix corresponds to a connection pattern of the crossbar switch 11. The Birkhoff theorem can be seen in "Inequalities: Theory of Majorization and Its Applications," by Albert W. Marshall and Ingram Olkin, ACADEMIC PRESS, 1979.

Birkhoff Theorem

For a doubly stochastic matrix $\tilde{R}$, there exists a set of positive values $\phi_k$ and a set of permutation matrices $P_k$ such that

$$\tilde{R} = \sum_{k} \phi_k P_k .$$

Let e be a column vector with all elements being one. As $\tilde{R}$ is doubly stochastic, the inference

$$e = \tilde{R} e = \sum_{k} \phi_k P_k e = \Big(\sum_{k} \phi_k\Big) e$$

can be obtained, which shows that $\sum_{k} \phi_k = 1$.

Algorithm 2: Deduced from Birkhoff Theorem

(b1) For a doubly stochastic matrix $\tilde{R}$, find a set of column indices $(i_1, i_2, \ldots, i_N)$ among the permutations of $(1, 2, 3, \ldots, N)$ such that all the corresponding elements $\tilde{r}_{m,i_m}$ of the doubly stochastic matrix are greater than zero, wherein $m = 1, 2, \ldots, N$.

(b2) Define a matrix $\tilde{R}_1 = \tilde{R} - \phi_1 P_1$, wherein $P_1$ is the permutation matrix corresponding to $(i_1, i_2, \ldots, i_N)$ and

$$\phi_1 = \min_{1 \le m \le N} \tilde{r}_{m,i_m} .$$

(b3) If $\phi_1$ is equal to one, then $\tilde{R}_1 e = \tilde{R} e - \phi_1 P_1 e = 0$, wherein 0 represents a column vector whose elements are all zero; then the matrix $\tilde{R}_1$ is a zero matrix and the decomposition operation is ended.

(b4) If $\phi_1$ is less than one, then generate the doubly stochastic matrix

$$\frac{\tilde{R}_1}{1-\phi_1}$$

and return to step (b1) to continue the decomposition operation.

Besides, for the supplied-rate matrix $\tilde{R}$ of the crossbar switch 11, there are at most $N^2 - 2N + 2$ kinds of connection patterns according to the Birkhoff theorem.
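Algorithms 1 and 2 can be illustrated with the following Python sketch. It is not the apparatus of the invention: the names `von_neumann`, `find_positive_permutation` and `birkhoff`, the tolerance `TOL`, and the rule for picking the element (i, j) in step (a1) (already-positive elements first, in row-major order) are assumptions, since the text leaves that choice open. Step (b1) is implemented with a standard augmenting-path bipartite matching restricted to positive elements, and step (b4) uses the improved variant that skips the renormalization by $1-\phi_1$.

```python
from typing import List, Tuple

Matrix = List[List[float]]
TOL = 1e-9  # numerical tolerance: an assumption of this sketch


def von_neumann(rate: Matrix) -> Matrix:
    """Algorithm 1: raise elements of a doubly substochastic matrix until
    every row sum and every column sum equals one."""
    n = len(rate)
    r = [row[:] for row in rate]  # work on a copy
    while True:
        rows = [sum(r[i]) for i in range(n)]
        cols = [sum(r[i][j] for i in range(n)) for j in range(n)]
        if max(rows + cols) > 1 + TOL:
            raise ValueError("rate matrix violates the no overbooking conditions")
        # Step (a1): elements whose row sum and column sum are both below one.
        cells = [(i, j) for i in range(n) for j in range(n)
                 if rows[i] < 1 - TOL and cols[j] < 1 - TOL]
        if not cells:
            return r  # step (a3): all sums are one (up to TOL)
        # Assumed tie-break: prefer already-positive elements, row-major order.
        positive = [(i, j) for i, j in cells if r[i][j] > TOL]
        i, j = (positive or cells)[0]
        # Step (a2): add epsilon = 1 - max(row sum, column sum) to element (i, j).
        r[i][j] += 1 - max(rows[i], cols[j])


def find_positive_permutation(m: Matrix) -> List[int]:
    """Step (b1): return perm with m[row][perm[row]] > TOL for every row,
    found by augmenting-path bipartite matching (Kuhn's algorithm)."""
    n = len(m)
    owner = [-1] * n  # owner[col] = row currently matched to column col

    def augment(row: int, seen: List[bool]) -> bool:
        for col in range(n):
            if m[row][col] > TOL and not seen[col]:
                seen[col] = True
                if owner[col] == -1 or augment(owner[col], seen):
                    owner[col] = row
                    return True
        return False

    for row in range(n):
        if not augment(row, [False] * n):
            raise ValueError("no permutation with all-positive elements")
    perm = [0] * n
    for col, row in enumerate(owner):
        perm[row] = col
    return perm


def birkhoff(ds: Matrix) -> List[Tuple[float, List[int]]]:
    """Algorithm 2 with the improved step (b4): return (phi_k, perm_k) pairs,
    where perm_k[i] is the output port connected to input port i."""
    n = len(ds)
    r = [row[:] for row in ds]
    terms: List[Tuple[float, List[int]]] = []
    while max(max(row) for row in r) > TOL:  # step (b3): stop at the zero matrix
        perm = find_positive_permutation(r)         # step (b1)
        phi = min(r[m][perm[m]] for m in range(n))  # step (b2)
        for m in range(n):
            r[m][perm[m]] -= phi
        terms.append((phi, perm))  # no division by 1 - phi (improved step b4)
    return terms
```

Applied to the rate matrix R of the 4x4 example in this description, `von_neumann` returns the matrix $\tilde{R}$ given there, and `birkhoff` returns positive coefficients that sum to one. The particular permutations found, and their order, depend on the matching order and need not coincide with the labels $P_1$, $P_2$, $P_3$ of the example.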
In practical application, step (b4) of Algorithm 2 can be further improved by omitting the generation of the doubly stochastic matrix $\tilde{R}_1/(1-\phi_1)$ and directly entering step (b1) instead. After step (b2) of Algorithm 2, every row sum and every column sum equals $1-\phi_1$. If a doubly stochastic matrix is obtained by dividing the matrix by $1-\phi_1$, the next coefficient is amplified $1/(1-\phi_1)$ times, so the coefficients obtained after decomposition must be multiplied by $1-\phi_1$ to derive the real coefficients; the result is therefore the same as directly entering step (b1).

After the linear combination of the permutation matrices (that is, of the connection patterns of the crossbar switch, weighted by their supplied rates) is obtained, it remains to determine how to set up these connection patterns in the time slots of the cycle time T of the crossbar switch 11, and how to control the packet delay and queue length. To this end, the present invention uses a Packetized Generalized Processor Sharing algorithm, also called PGPS, for the timing scheduling of the crossbar switch 11. The Packetized Generalized Processor Sharing algorithm can be seen in A. K. Parekh and R. G. Gallager, "A generalized processor sharing approach to flow control in integrated services networks: the single-node case," IEEE/ACM Transactions on Networking, Vol. 1, pp. 344-357, 1993.

Algorithm 3: Packetized Generalized Processor Sharing Algorithm (PGPS)

(c1) Assume that Algorithm 2 finds K types of permutations, and give each permutation a token;

(c2) In the first time slot of the cycle time, each of the K permutations generates its first token, and the virtual finishing time of the first token of the k-th permutation is derived as

$$F_k^1 = \frac{1}{\phi_k},$$

wherein $\phi_k$ is the corresponding coefficient of the linear combination of the plurality of permutation matrices; these K tokens are sorted in increasing order of virtual finishing time;

(c3) The permutation matrix with the smallest virtual finishing time has the right to be set up as the connection pattern of the crossbar switch in the time slot; and

(c4) After the connection pattern of the k-th permutation matrix has been set up using its (l-1)-th token, the k-th permutation generates its l-th token, whose virtual finishing time is

$$F_k^l = F_k^{l-1} + \frac{1}{\phi_k},$$

while the virtual finishing times of the other K-1 tokens keep their old values. The new token is inserted into the sorted token list, and the procedure repeats from step (c3).

An example of a 4x4 crossbar switch is given to illustrate the whole process of the crossbar switch 11, considering the following rate matrix R:

$$R = \begin{pmatrix} 0 & 0.3 & 0.2 & 0.4 \\ 0.2 & 0.3 & 0 & 0.2 \\ 0.4 & 0.1 & 0.3 & 0 \\ 0.2 & 0 & 0.2 & 0.3 \end{pmatrix}$$

First, the elements in positions (1,2), (2,1), (2,2), (3,2), (3,3), (4,3) and (4,4) are changed according to Algorithm 1 to obtain the following doubly stochastic matrix $\tilde{R}$:

$$\tilde{R} = \begin{pmatrix} 0 & 0.4 & 0.2 & 0.4 \\ 0.4 & 0.4 & 0 & 0.2 \\ 0.4 & 0.2 & 0.4 & 0 \\ 0.2 & 0 & 0.4 & 0.4 \end{pmatrix}$$

Secondly, the matrix $\tilde{R}$ is decomposed by Algorithm 2 into a linear combination of a plurality of permutation matrices,

$$\tilde{R} = \phi_1 P_1 + \phi_2 P_2 + \phi_3 P_3,$$

wherein $\phi_1 = \phi_2 = 0.4$ and $\phi_3 = 0.2$, and $P_1$, $P_2$ and $P_3$ are the corresponding permutation matrices.

In the first time slot of the cycle time T, the first tokens of the three permutation matrices $P_1$, $P_2$ and $P_3$ are generated by the crossbar switch 11, and their corresponding virtual finishing times are

$$F_1^1 = \frac{1}{\phi_1} = 2.5, \qquad F_2^1 = \frac{1}{\phi_2} = 2.5, \qquad F_3^1 = \frac{1}{\phi_3} = 5,$$

respectively. The sorting result of the above virtual finishing times is $F_1^1 = F_2^1 < F_3^1$. The connection pattern of the crossbar switch 11 is therefore set up according to the permutation matrix $P_1$, and then the virtual finishing time of the token of the permutation matrix $P_1$ is modified to

$$F_1^2 = F_1^1 + \frac{1}{\phi_1} = 5 .$$

The virtual finishing times of the tokens of the permutation matrices $P_2$ and $P_3$ are not changed, and are still $F_2^1 = 2.5$ and $F_3^1 = 5$. According to the rules, the virtual finishing times of the tokens of the three permutation matrices are sorted, and the sorting result is $F_2^1 < F_3^1 = F_1^2$. In the second time slot, the connection pattern of the crossbar switch 11 is set up
according to the permutation matrix $P_2$, and its virtual finishing time is modified to

$$F_2^2 = F_2^1 + \frac{1}{\phi_2} = 5,$$

while the virtual finishing times of the permutation matrices $P_1$ and $P_3$ are not changed, still being $F_1^2 = 5$ and $F_3^1 = 5$. According to the rules, the virtual finishing times of the tokens of the three permutation matrices are sorted, and the sorting result is $F_1^2 = F_2^2 = F_3^1$. In the third, fourth and fifth time slots, the connection patterns of the crossbar switch 11 are set up according to the permutation matrices $P_1$, $P_2$ and $P_3$, respectively. When the fifth time slot is finished, the virtual finishing times of the tokens of the three permutation matrices are $F_1^3 = F_2^3 = 7.5$ and $F_3^2 = 10$. The three virtual finishing times are sorted, and the sorting result is $F_1^3 = F_2^3 < F_3^2$. The connection patterns of the crossbar switch 11 in each time slot are determined sequentially based on Algorithm 3. In the above example, the three kinds of permutations appear in the ratio $\phi_1:\phi_2:\phi_3 = 4:4:2 = 2:2:1$.

If the demand of the traffic flow is known in advance, Algorithm 1 and Algorithm 2 are computed only once to determine the connection patterns of the crossbar switch, and only Algorithm 3 is then computed on-line. As long as the demanded flow is invariable, Algorithm 1 and Algorithm 2 need not be recomputed. Sometimes, however, the demanded flow changes after a period of time, or the input flow has a burst behavior, in other words, the input flow enters the crossbar switch 11 densely for a period of time. Under this circumstance, if the connection patterns of the crossbar switch are determined from the average traffic flow through Algorithms 1 to 3, the queue length of the input buffers will increase rapidly during a short time. A means of dynamically calculating rate is used to resolve the problem: it calculates the flow variation of the crossbar switch during one cycle time to generate a new flow demand, and then determines the connection patterns of each time slot in the cycle time by Algorithms 1 to 3.

A possible way to implement the dynamically calculating rate is as follows:

$$r_{ij}(n+2) = \big(1-\alpha(n)\big)\, r_{ij}(n+1) + \alpha(n)\, \frac{A_{ij}(nT) - A_{ij}((n-1)T)}{T} \qquad (3)$$

$$\cdots \qquad (4)$$

wherein $0 < \alpha(n) < 1$, $n \ge 1$, and n represents the number of times the rate has been dynamically calculated; $r_{ij}(0)$ and $r_{ij}(1)$ are initial values of the input rate of the crossbar switch; T is the cycle time for calculating the input rate of the crossbar switch 11; $A_{ij}(nT) - A_{ij}((n-1)T)$ is the number of packets from the i-th input port to the j-th output port of the crossbar switch 11 during time (n-1)T to time nT, so that

$$\frac{A_{ij}(nT) - A_{ij}((n-1)T)}{T}$$

is the input traffic rate of the crossbar switch 11 during time (n-1)T to time nT; and $\alpha(n)$ is a parameter adjusting the effect of the newly measured input rate. If the variation of the traffic flow of the crossbar switch 11 is large, $\alpha(n)$ should be increased so that the estimated rate adapts quickly, and if the input rate is smooth, $\alpha(n)$ should be scaled down; if $\alpha(n) = 1/n$, the estimated input rate is the sample mean of the real input rate.

FIG. 2 is a schematic diagram of the switching apparatus using bandwidth decomposition according to a preferred embodiment of the present invention. The present invention comprises a rate-measuring mechanism 21, a first input port 22 to an N-th input port 23, a processing mechanism 26 and a crossbar switch 11. The rate-measuring mechanism 21 measures the input flow and performs the dynamic rate calculation based on equations (3) and (4). Each of the first input port 22 to the N-th input port 23 contains a queue for buffering input packets and is connected to the rate-measuring mechanism 21. The crossbar switch 11, connected to the plurality of input ports 22, 23, is used to transfer the plurality of input packets to the plurality of output ports. The processing mechanism 26, connected to the rate-measuring mechanism 21, is used to generate the single connection pattern of the crossbar switch 11 for each time slot according to Algorithms 1 to 3. The processing mechanism 26 includes a processing unit 24 and a control unit 25. The processing unit 24 generates the permutation matrices $P_1$ to $P_K$ and the corresponding coefficients of the linear combination $\phi_1$ to $\phi_K$ according to Algorithms 1 and 2. The control unit 25 receives the permutation matrices $P_1$ to $P_K$ and the corresponding coefficients $\phi_1$ to $\phi_K$ from the processing unit 24, and generates the single permutation matrix P for each time slot according to Algorithm 3. The processing mechanism 26 can be implemented by software or hardware, and because the algorithms of the present invention are very regular and symmetric, implementation in either hardware or software is very easy and flexible. The permutation matrix P output from the processing mechanism 26 is used to control the controller (not shown) at each intersection of the rows and columns of the crossbar switch 11. If an element of the permutation matrix P is logic one, an input packet can reach the corresponding output port. If an element of the permutation matrix P is logic zero, an input packet cannot reach the corresponding output port.

FIG. 3 is a structure diagram of the control unit in FIG. 2 according to a preferred embodiment of the present invention. The structure comprises a plurality of registers 31, a selecting mechanism 33 and a multiplexer 32. The plurality of registers 31 are used to store the plurality of permutation matrices $P_1$ to $P_K$ generated by the processing unit 24. The selecting mechanism 33 receives the coefficients $\phi_1$ to $\phi_K$ of the linear combination and generates a control signal S, which serves as the selecting signal of the multiplexer 32 for selecting the single permutation matrix among $P_1$ to $P_K$ used in one time slot.

FIG. 4 is a structure diagram of the selecting mechanism in FIG. 3 according to a preferred embodiment of the present invention. The structure comprises a plurality of dividers 41, a plurality of selecting registers 42, a register file 45, a sorter 43 and an adder 44. The plurality of dividers 41 are used to generate the reciprocals of the coefficients $\phi_1$ to $\phi_K$ as virtual finishing times. During the cycle time T of the crossbar switch 11, every such virtual finishing time is stored in the register file 45. At the beginning of the cycle time, the virtual finishing times output from the plurality of dividers 41 are stored in the plurality of selecting registers 42. The sorter 43, connected to the plurality of selecting registers 42, selects the smallest virtual finishing time and outputs the series number S of the selecting register containing it. The adder 44 adds the smallest virtual finishing time selected by the sorter 43 to the value stored in the register with series number S of the register file 45, and feeds the result back to the selecting register 42 with series number S.
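The selecting mechanism of FIG. 4 performs steps (c2) to (c4) of Algorithm 3 in hardware. As a software illustration only, the sketch below models the dividers 41 as the reciprocals $1/\phi_k$, the selecting registers 42 and the sorter 43 as a binary min-heap, and the adder 44 as the update $F_k^l = F_k^{l-1} + 1/\phi_k$. The function name `pgps_schedule` and the tie-breaking rule (on equal virtual finishing times, the smaller index k wins) are assumptions, not taken from the text.

```python
import heapq
from typing import List


def pgps_schedule(phis: List[float], num_slots: int) -> List[int]:
    """Return, for each time slot, the index k of the permutation matrix P_k
    to set up, following steps (c2) to (c4) of Algorithm 3."""
    increments = [1.0 / phi for phi in phis]  # dividers 41: 1 / phi_k
    # Selecting registers 42: (virtual finishing time, k). Tuples compare by
    # time first, then by k, which gives the assumed tie-break.
    heap = [(inc, k) for k, inc in enumerate(increments)]
    heapq.heapify(heap)
    schedule = []
    for _ in range(num_slots):
        finish, k = heapq.heappop(heap)  # sorter 43: smallest finishing time
        schedule.append(k)               # control signal S for multiplexer 32
        heapq.heappush(heap, (finish + increments[k], k))  # adder 44
    return schedule
```

With the coefficients of the 4x4 example, `pgps_schedule([0.4, 0.4, 0.2], 5)` returns `[0, 1, 0, 1, 2]`, that is $P_1, P_2, P_1, P_2, P_3$ for the first five time slots, as in the worked example. A heap gives O(log K) work per time slot in software, whereas the hardware sorter 43 compares all K selecting registers in parallel.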
FIG. 5 is a timing diagram according to the present invention, wherein the steps of the present invention are divided into a measuring step, a calculating step and a scheduling step. In the measuring step, the input flow during time (n-1)T to time nT is measured by equations (3) and (4), and a demanded-rate matrix R is obtained. In the calculating step, the permutation matrices $P_k$ and weights $\phi_k$ are obtained by Algorithms 1 and 2; the cycle time T, which includes a plurality of time slots, must be long enough for the present apparatus to execute Algorithms 1 and 2. In the scheduling step, the connection pattern of the crossbar switch 11 in each time slot is determined on-line by Algorithm 3.

As mentioned above, the characteristics of the present invention are as follows:

1. If the no overbooking conditions of inequalities (1) and (2) are satisfied, then

$$C_{ij}(t) - C_{ij}(s) \ge r_{ij}(t-s) - N^2 + 2N - 2$$

is guaranteed by using Algorithms 1 to 3 as scheduling policies, wherein $C_{ij}(t) - C_{ij}(s)$ is the cumulative number of time slots assigned to the traffic from the i-th input port to the j-th output port from time s to time t, $r_{ij}$ is the rate from the i-th input port to the j-th output port, and N is the number of input ports of the crossbar switch 11.

2. It is not necessary to speed up inside the crossbar switch 11, and the switching of each packet is completed within one time slot.

3. If the traffic flow through the crossbar switch 11 satisfies the no overbooking conditions of inequalities (1) and (2), the present invention provides a supplied-rate matrix $\tilde{R}$ that is not less than the demanded-rate matrix R, so that the demand of the traffic flow is met. Therefore, the present invention is a "uniformly good" method, and can reach 100% output rate.

4. With Algorithms 1 to 3, it is not necessary to determine a maximal matching between input ports and output ports in each time slot. If the demanded-rate matrix R does not change within the cycle time of measuring the traffic flow, Algorithms 1 and 2 need not be recomputed, and only Algorithm 3 must be computed on-line.

5. The algorithms of the present invention are not computationally complex, and mostly consist of basic matrix operations. With current VLSI technology, the present invention can be implemented easily and used widely in business.

In other applications, the present invention can supply different service grades. For example, the service grades can be classified into guaranteed-rate service and best-effort service. In guaranteed-rate service, the clients first request their required rates, and the crossbar switch 11 must then satisfy these requests and support the rate guarantees. In best-effort service, the crossbar switch 11 first supports the rate guarantees, and then allocates the residual bandwidth to the clients. Clearly, the clients with guaranteed-rate service have higher priority than the clients with best-effort service. First, the input rate through the crossbar switch 11 is measured by the dynamically calculating rate of equations (3) and (4). When the input rate satisfies the no overbooking conditions of inequalities (1) and (2), the present invention can support a service that satisfies the demand of all input rates: Algorithms 1 to 3 are executed directly, and the rate guarantees and rate fairness for all clients are obtained. When the input rates do not satisfy the no overbooking conditions, however, the crossbar switch 11 gives a higher priority to the clients with guaranteed-rate service. In other words, the crossbar switch 11 allocates the elements (representing bandwidth) of R to the clients with guaranteed-rate service, and leaves the residual bandwidth to the clients with best-effort service. That is, the present apparatus first sets up the traffic flow of the clients with the highest priority so that these clients obtain their demanded bandwidth, and after that, a water filling method is used to allocate the residual bandwidth to the others.

FIG. 6 is a flow diagram of a water filling procedure for the switching apparatus using bandwidth decomposition according to the present invention. In step 61, the elements of a matrix R are initialized to the rate matrix of the guaranteed-rate service. In step 62, the elements of the matrix that no longer need to join the bandwidth allocation are marked. In step 63, it is determined whether any elements still join the bandwidth allocation. If the answer is no, the procedure enters step 65 and the bandwidth allocation is ended. If the answer in step 63 is yes, the procedure enters step 64 and increases the elements that have the right to join the bandwidth allocation by a constant, until one or more elements no longer need to join the bandwidth allocation. Generally speaking, the value of every element in the matrix R is increased slowly until it overflows. The overflowed elements of the matrix are not allocated any more bandwidth, and the other elements continue to increase in the bandwidth allocation procedure until all elements of the matrix stop. An element of the matrix overflows when the sum of the column in which the element is situated reaches one, or when the element satisfies the rate demand of both service grades. It is unnecessary to consider the row sum constraint, because every input port has at most one input packet in each time slot, so the row sums will not violate the no overbooking condition of inequality (2). Therefore, only whether a column sum violates inequality (1) is considered. After the water filling procedure is finished, Algorithms 1 to 3 are performed for rate guarantees and rate fairness. With the two service grades mentioned above, the output flow of the crossbar switch 11 reaches its maximum under the condition of the guaranteed-rate service.

A connection pattern of the crossbar switch 11 is set up in each time slot. There is a constraint that when packets stored in different input buffers are destined to the same output port, only one of them can be transmitted in one time slot. This constraint creates low throughput caused by head of line blocking, also called HOL blocking. The cause of HOL blocking is the single FIFO (First In First Out) structure of the input buffers. In other words, the packets stored in an input buffer are transmitted sequentially according to their storing time, and the later packets must stay in the input buffer even when they are destined to output ports different from that of the earlier packets. This situation largely reduces the utilization of the crossbar switch 11. The present invention uses virtual output queuing, also called VOQ, to resolve the above problem: every input buffer is divided into 2N virtual output queues implemented by a memory means. The traffic flows with different service grades are stored in different virtual output queues, depending not only on the output port the packets are destined to but also on their service grade, wherein the n-th virtual output queue stores the packets transmitted to the n-th output port ($1 \le n \le N$). When a packet enters an input port, the packet is stored in the virtual output queue corresponding to the output port the packet is destined to; in other words, the memory address of the virtual output queue is recorded. The packets to be output can be read out by polling the memory means, and the packet blocking described above no longer happens.

The above-described embodiments of the present invention are intended to be illustrative only. Numerous alternative embodiments may be devised by those skilled in the art without departing from the scope of the following claims.
What is claimed is: 

1. A switching method using bandwidth decomposition, 
applied in packet switching of a network system, comprising 
the following steps: 

(a) using a von Neumann algorithm to transform a rate
matrix $R$ of an $N \times N$ input-buffered crossbar switch into a
doubly stochastic matrix $\tilde{R}$, wherein an element $r_{i,j}$ of
said rate matrix represents the rate from the i-th input
port to the j-th output port of said crossbar switch, both
i and j are not less than one and not larger than N, and
N is the number of the input ports of said crossbar
switch;

(b) using a Birkhoff theorem to decompose said doubly 
stochastic matrix into a linear combination of a plural- 
ity of permutation matrices, and each one of said 
plurality of permutation matrices corresponding to a 
connection pattern of said crossbar switch; and 

(c) using a Packetized Generalized Processor Sharing 
algorithm to set up a connection pattern of said crossbar 
switch in each time slot of a cycle time, wherein the 
cycle time represents the time needed to transmit a 
fixed number of input packets. 

2. The method of claim 1, further comprising a step for 
dynamically calculating rate variations of said crossbar 
switch before step (a). 

3. The method of claim 2, wherein if the rate outputted
from an output port or the rate inputted from an input port
of said crossbar switch is larger than one, then a water
filling algorithm is added to allocate the residual bandwidth
between the step of dynamically calculating rate variations
and step (a).
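Claim 3 invokes the water filling step only when some input or output port is overbooked, that is, when a row sum or a column sum of the rate matrix exceeds one. A minimal Python sketch of that test, assuming rates are normalized to packets per time slot and stored as a list of rows; the function name `is_overbooked` and the tolerance `tol` are hypothetical:

```python
from typing import List

def is_overbooked(rate: List[List[float]], tol: float = 1e-12) -> bool:
    """Return True if any input port (row) or output port (column) carries more than 1."""
    n = len(rate)
    row_sums = [sum(row) for row in rate]                              # load per input port
    col_sums = [sum(rate[i][j] for i in range(n)) for j in range(n)]   # load per output port
    return any(s > 1 + tol for s in row_sums + col_sums)
```

For instance, `[[0.5, 0.6], [0.2, 0.3]]` is overbooked because input port 0 asks for 1.1, while `[[0.5, 0.4], [0.3, 0.5]]` is not.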

4. The method of claim 1, wherein step (a) further 
comprises the following steps: 

(a1) if the sum of all elements in the rate matrix $R$ is less
than N, then there exists a specific element such that the sum
of all elements in the same row as the specific element
and the sum of all elements in the same column as the
specific element are both less than one; and finding out the
specific element;

(a2) defining

$$\epsilon = 1 - \max\left[\sum_{n} r_{i,n},\ \sum_{m} r_{m,j}\right],$$

wherein $\epsilon$ is the value obtained by subtracting the larger
of the i-th row sum and the j-th column sum of the rate
matrix $R$ from one; adding the value $\epsilon$ to the element having
index (i, j) in the rate matrix to generate a new matrix $R_1$,
wherein the number of row sums and column sums in $R_1$ that are
strictly smaller than one is at least one less than that in
the rate matrix $R$; and

(a3) repeating step (a1) and step (a2) until a doubly
stochastic matrix $\tilde{R}$ is obtained.
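Steps (a1) to (a3) can be sketched in Python as follows. This is an illustrative sketch, not the patented implementation: it assumes a non-overbooked rate matrix (all row and column sums at most one) given as a list of rows, picks the first qualifying element in row-major order, and compares floating-point sums against a small tolerance `tol`. The name `von_neumann_fill` is hypothetical.

```python
from typing import List

Matrix = List[List[float]]

def von_neumann_fill(rate: Matrix, tol: float = 1e-12) -> Matrix:
    """Raise entries of a non-overbooked rate matrix until every row and column sums to 1."""
    n = len(rate)
    r = [row[:] for row in rate]  # work on a copy, keep the measured rates intact
    while True:
        row_sums = [sum(r[i]) for i in range(n)]
        col_sums = [sum(r[i][j] for i in range(n)) for j in range(n)]
        # (a1) find an element whose row sum and column sum are both below one
        pair = next(((i, j) for i in range(n) for j in range(n)
                     if row_sums[i] < 1 - tol and col_sums[j] < 1 - tol), None)
        if pair is None:
            return r  # (a3) no such element left: the matrix is doubly stochastic
        i, j = pair
        # (a2) epsilon = 1 - max(row sum, column sum); adding it fills that row or column
        eps = 1 - max(row_sums[i], col_sums[j])
        r[i][j] += eps
```

Each pass brings at least one more row or column sum up to one and never touches a full row or column again, so the loop ends after at most 2N additions.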

5. The method of claim 1, wherein step (b) comprises the 
following steps:

(b1) finding out a set of column indices $(i_1, i_2, \ldots, i_N)$
from the permutations of $(1, 2, 3, \ldots, N)$ for a doubly
stochastic matrix $R$ such that all the corresponding
elements $r_{k,i_k}$ of the doubly stochastic matrix are larger
than zero, wherein $k = 1, 2, \ldots, N$;

(b2) defining a matrix $R_1$ whose value is equal to
$R - \phi_1 P_1$, wherein $P_1$ is the permutation matrix
corresponding to $(i_1, i_2, \ldots, i_N)$,
$\phi_1 = \min_{1 \le k \le N}\left[r_{k,i_k}\right]$, $\phi_1$ is the smallest
value among $r_{k,i_k}$, and $k = 1, 2, \ldots, N$;

(b3) if $\phi_1$ is equal to one and $R_1 e = R e - P_1 e = 0$, wherein $0$
represents a column vector whose elements are zero,
then matrix $R_1$ is a zero matrix and the decomposition
operation is ended; and

(b4) if $\phi_1$ is less than one, then generating a doubly
stochastic matrix

$$\frac{1}{1 - \phi_1} R_1$$

and returning to step (b1) to continue the decomposition
operation.

6. The method of claim 1, wherein step (b) comprises the 
following steps: 

(b1) finding out a set of column indices $(i_1, i_2, \ldots, i_N)$
from the permutations of $(1, 2, 3, \ldots, N)$ for a doubly
stochastic matrix $R$ such that all the corresponding
elements $r_{k,i_k}$ of the doubly stochastic matrix are larger
than zero, wherein $k = 1, 2, \ldots, N$;

(b2) defining a matrix $R_1$ whose value is equal to
$R - \phi_1 P_1$, wherein $P_1$ is the permutation matrix
corresponding to $(i_1, i_2, \ldots, i_N)$,
$\phi_1 = \min_{1 \le k \le N}\left[r_{k,i_k}\right]$, $\phi_1$ is the smallest
value among $r_{k,i_k}$, and $k = 1, 2, \ldots, N$;

(b3) if $\phi_1$ is equal to one and $R_1 e = R e - P_1 e = 0$, wherein $0$
represents a column vector whose elements are all zero,
then matrix $R_1$ is a zero matrix and the decomposition
operation is ended; and
(b4) if $\phi_1$ is less than one, then returning to step (b1) to
continue the decomposition operation.
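A compact Python sketch of the decomposition in claims 5 and 6, written in the unnormalized form of claim 6. Step (b1) needs a permutation whose selected elements are all positive; the sketch finds one with a simple augmenting-path bipartite matching (Kuhn's algorithm), which is a method chosen here for illustration and not one specified in the claims. A permutation is represented as a list `cols`, with `cols[k]` the column used by row k; the function names and the tolerance `tol` are hypothetical.

```python
from typing import Dict, List, Set, Tuple

Matrix = List[List[float]]

def positive_permutation(r: Matrix, tol: float) -> List[int]:
    """Step (b1): pick one column per row so that every selected element is > tol."""
    n = len(r)
    match_col: Dict[int, int] = {}  # column -> row currently assigned to it

    def try_row(k: int, seen: Set[int]) -> bool:
        # Augmenting-path search: give row k a column, re-routing earlier rows if needed.
        for j in range(n):
            if r[k][j] > tol and j not in seen:
                seen.add(j)
                if j not in match_col or try_row(match_col[j], seen):
                    match_col[j] = k
                    return True
        return False

    for k in range(n):
        if not try_row(k, set()):
            raise ValueError("no positive permutation; input is not doubly stochastic")
    cols = [0] * n
    for j, k in match_col.items():
        cols[k] = j
    return cols

def birkhoff_decompose(r: Matrix, tol: float = 1e-9) -> List[Tuple[float, List[int]]]:
    """Return (phi, cols) pairs whose weighted permutation matrices sum back to r."""
    r = [row[:] for row in r]  # work on a copy
    terms: List[Tuple[float, List[int]]] = []
    while any(v > tol for row in r for v in row):     # (b3) stop at the zero matrix
        cols = positive_permutation(r, tol)            # (b1)
        phi = min(r[k][j] for k, j in enumerate(cols)) # phi_1 = smallest selected element
        for k, j in enumerate(cols):                   # (b2) R_1 = R - phi_1 * P_1
            r[k][j] -= phi
        terms.append((phi, cols))                      # (b4) continue with R_1
    return terms
```

For example, `birkhoff_decompose([[0.8, 0.2], [0.2, 0.8]])` returns `[(0.2, [1, 0]), (0.8, [0, 1])]`: 0.2 times the swap pattern plus 0.8 times the identity pattern, with coefficients summing to one.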

7. The method of claim 1, wherein step (c) comprises the 
following steps: 

(c1) assuming that the Birkhoff theorem finds out K types
of permutations, and giving each permutation a token;
(c2) in the first time slot of the cycle time, each of the K
permutations generating the first token, and deriving a
virtual finishing time of the first token of the k-th
permutation as

$$F_k^{(1)} = \frac{1}{\phi_k},$$

wherein $\phi_k$ is the corresponding coefficient of the linear
combination of the plurality of permutation matrices, and
sorting the virtual finishing times of these K tokens in an
increasing order;

(c3) a permutation matrix with the smallest virtual
finishing time having the right to be set up as the
connection pattern of the crossbar switch in the
corresponding time slot; and
(c4) the k-th token in the l-th time slot being generated by
the crossbar switch after the corresponding connection
pattern of the k-th permutation matrix of the (l-1)-th
time slot is set up; the virtual finishing time of the
k-th token of the l-th time slot being

$$F_k^{(l)} = F_k^{(l-1)} + \frac{1}{\phi_k},$$

and the virtual finishing times of the other K-1 tokens
retaining their old values; the virtual finishing time of the
k-th token of the l-th time slot being inserted into the sorted
token list, and repeating from step (c3).
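Steps (c1) to (c4) amount to a smallest-virtual-finishing-time-first scheduler over the K permutation matrices. The Python sketch below keeps the tokens in a binary heap in place of a sorted token list (an implementation choice, not taken from the claims), uses $1/\phi_k$ as the first virtual finishing time of permutation k, and adds $1/\phi_k$ each time permutation k is served; ties are broken by the lower index k, which is an assumption. `pgps_schedule` is a hypothetical name.

```python
import heapq
from typing import List, Tuple

def pgps_schedule(phis: List[float], slots: int) -> List[int]:
    """Return, for each time slot, the index k of the permutation set up in that slot."""
    # (c2) first token of permutation k finishes at 1/phi_k; the heap keeps tokens sorted
    heap: List[Tuple[float, int]] = [(1.0 / phi, k) for k, phi in enumerate(phis)]
    heapq.heapify(heap)
    order: List[int] = []
    for _ in range(slots):
        finish, k = heapq.heappop(heap)  # (c3) smallest virtual finishing time wins the slot
        order.append(k)
        # (c4) next token of permutation k; the other K-1 tokens keep their values
        heapq.heappush(heap, (finish + 1.0 / phis[k], k))
    return order
```

With coefficients $\phi = (0.5, 0.25, 0.25)$, eight time slots give the order `[0, 0, 1, 2, 0, 0, 1, 2]`, so each connection pattern is used in proportion to its coefficient.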

8. The method of claim 2, wherein the step for said
dynamically calculating rate is implemented by the
following equation:

$$r_{i,j}(n+1) = \bigl(1 - a(n)\bigr)\, r_{i,j}(n) + a(n)\,\frac{A_{i,j}(nT) - A_{i,j}((n-1)T)}{T},$$

$$r_{i,j}(0) = r_{i,j}(1),$$

wherein $0 < a(n) < 1$ and $n \ge 1$; n represents the number of
times the rate of said crossbar switch has been dynamically
calculated; a(n) represents a parameter adjusting the effect of
the input rate: the more variable the input rate is, the larger
the parameter a(n) should be set, and the smoother the input
rate is, the smaller the parameter a(n) should be set; $r_{i,j}(0)$
and $r_{i,j}(1)$ represent the initial input rates of said
crossbar switch; T represents the cycle time; and
$A_{i,j}(nT) - A_{i,j}((n-1)T)$ represents the number of packets
from the i-th input port to the j-th output port during time
$(n-1)T$ to time $nT$.
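The equation in claim 8 is only partly legible in this copy; it reads as a first-order exponentially weighted moving average of per-cycle arrival counts. A minimal Python sketch under that assumption, where `counts[i][j]` stands for $A_{i,j}(nT) - A_{i,j}((n-1)T)$ and the measured rate is that count divided by the cycle time T; the function name `update_rates` and its signature are hypothetical.

```python
from typing import List

Matrix = List[List[float]]

def update_rates(prev: Matrix, counts: Matrix, a: float, cycle: float) -> Matrix:
    """One estimator step (assumed EWMA form): blend old rates with the last cycle's rates."""
    if not 0 < a < 1:
        raise ValueError("a(n) must lie strictly between 0 and 1")
    n = len(prev)
    # new rate = (1 - a) * old rate + a * (packets seen in the last cycle / cycle time)
    return [[(1 - a) * prev[i][j] + a * counts[i][j] / cycle for j in range(n)]
            for i in range(n)]
```

For example, an old rate of 0.5 and 80 packets over a cycle of 100 slots with `a = 0.5` give a new estimate of 0.65. A larger `a` tracks bursty traffic faster, while a smaller `a` averages out fluctuations of a smooth input rate, in line with the guidance in claim 8.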

9. The method of claim 3, wherein said water filling 
algorithm comprises the following steps: 

(9.1) the initial elements of a matrix being set up as the
rate matrix with guaranteed-rate service, said
guaranteed-rate service representing a fixed rate that must
be reserved from each input port to the corresponding
output port;

(9.2) marking the elements unnecessary to join the band- 
width allocation in said rate matrix; 

(9.3) determining if there are other elements to join the 
bandwidth allocation; 

(9.4) if the answer in step (9.3) is no, then entering step
(9.6);

(9.5) if the answer in step (9.3) is yes, then adding a
constant to each element having the right to join
bandwidth allocation in said rate matrix until one or
more elements no longer need to join the bandwidth
allocation; then entering step (9.2); and

(9.6) ending. 
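Claim 9 does not spell out which elements are "unnecessary to join the bandwidth allocation". The Python sketch below assumes an element stops growing once its input row or output column is full (sum equal to one), which yields a max-min fair share of the residual bandwidth; the set `eligible` of (input, output) pairs that may receive extra bandwidth, the tolerance `tol`, and the name `water_fill` are all hypothetical.

```python
from typing import List, Set, Tuple

Matrix = List[List[float]]

def water_fill(guaranteed: Matrix, eligible: Set[Tuple[int, int]],
               tol: float = 1e-12) -> Matrix:
    """Share residual bandwidth equally among eligible (i, j) until rows/columns fill up."""
    n = len(guaranteed)
    r = [row[:] for row in guaranteed]           # (9.1) start from the guaranteed rates
    active = set(eligible)
    while True:
        row_sums = [sum(r[i]) for i in range(n)]
        col_sums = [sum(r[i][j] for i in range(n)) for j in range(n)]
        # (9.2) drop elements whose input row or output column is already full
        active = {(i, j) for (i, j) in active
                  if row_sums[i] < 1 - tol and col_sums[j] < 1 - tol}
        if not active:                           # (9.3)/(9.4): nothing left to grow
            return r                             # (9.6)
        # (9.5) largest common increment before some row or column reaches 1
        rows_n = [sum(1 for (i, _) in active if i == k) for k in range(n)]
        cols_n = [sum(1 for (_, j) in active if j == k) for k in range(n)]
        step = min([(1 - row_sums[k]) / rows_n[k] for k in range(n) if rows_n[k]] +
                   [(1 - col_sums[k]) / cols_n[k] for k in range(n) if cols_n[k]])
        for (i, j) in active:
            r[i][j] += step
```

For a guaranteed-rate matrix `[[0.2, 0.0], [0.0, 0.4]]` with all four elements eligible, the first round raises every element by 0.3 until input 1 and output 1 are full, and the second round gives the remaining 0.2 to element (0, 0), ending (up to rounding) at `[[0.7, 0.3], [0.3, 0.7]]`.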

10. The method of claim 1, further comprising steps of 
virtual output queuing as follows before all steps: 

(10.1) dividing each input buffer into $k \times N$ virtual output
queues, wherein k represents the number of service
grades of said crossbar switch;

(10.2) storing an input packet in a corresponding virtual
output queue according to the serial number of said
output port; and

(10.3) a packet being read out from the corresponding 
virtual output queue. 

11. The method of claim 10, wherein said plurality of
virtual output queues are implemented by a memory means;
each input packet is stored at a specific memory address
corresponding to the serial number of the virtual output
queue, and each packet is read out from a specific
memory address corresponding to the serial number of the
virtual output queue.
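A minimal Python model of the virtual output queuing in claims 10 and 11. The memory means is modeled as a dictionary of FIFO queues keyed by (service grade, output port), standing in for per-queue memory addresses; the class name `VOQBuffer` and the grade numbering (for example, 0 for guaranteed-rate and 1 for best-effort service) are assumptions for illustration.

```python
from collections import deque
from typing import Deque, Dict, Optional, Tuple

class VOQBuffer:
    """One input buffer split into k x N FIFO queues, one per (service grade, output port)."""

    def __init__(self, grades: int, outputs: int) -> None:
        # (10.1) k x N virtual output queues
        self.queues: Dict[Tuple[int, int], Deque[str]] = {
            (g, j): deque() for g in range(grades) for j in range(outputs)}

    def enqueue(self, packet: str, grade: int, output: int) -> None:
        # (10.2) the queue is chosen by the packet's output port and service grade
        self.queues[(grade, output)].append(packet)

    def dequeue(self, grade: int, output: int) -> Optional[str]:
        # (10.3) read from the selected queue only; None if it is empty
        q = self.queues[(grade, output)]
        return q.popleft() if q else None
```

Because a packet for output 1 sits in its own queue, it can be read out while an earlier packet for output 0 is still waiting, so the head-of-line blocking of a single FIFO input buffer does not arise.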

12. A switching apparatus using bandwidth
decomposition, applied in packet switching of a network
system, comprising:

a rate-measuring mechanism for dynamically measuring
the input rate of said switching apparatus;
a plurality of input ports, connected to said
rate-measuring mechanism, including a plurality of storing
devices for storing input packets;
a crossbar switch, connected to said plurality of input
ports, used to transmit said input packets to output ports
of said switching apparatus using bandwidth
decomposition; and
a processing mechanism, connected to said
rate-measuring mechanism, for transforming a rate matrix
into connection patterns of said crossbar switch in any
time slot of a cycle time, wherein the cycle time
represents the time needed to transmit a fixed number
of input packets.

13. The apparatus of claim 12, wherein said processing 
mechanism comprises: 

a processing unit for decomposing said rate matrix into a 
linear combination of a plurality of permutation 
matrices, and each one of said plurality of permutation 
matrices corresponding to a connection pattern of said 
crossbar switch; and 

a control unit for setting the connection patterns of said 
crossbar switch in one time slot. 

14. The apparatus of claim 13, wherein said control unit 
comprises: 

a plurality of registers for storing said plurality of con- 
nection patterns; 

a multiplexer connected to said plurality of registers for 
outputting one of said plurality of connection patterns; 
and 

a selecting mechanism for generating control signals of 
said multiplexer based on a Packetized Generalized 
Processor Sharing algorithm. 

15. The apparatus of claim 14, wherein said selecting 
mechanism comprises: 

a plurality of dividers for generating reciprocals of coef- 
ficients of the linear combination of said permutation 
matrices as virtual finishing times corresponding to said 
plurality of permutation matrices; 

a plurality of selecting registers, connected to said plu- 
rality of dividers and an adder for storing the content of 
said plurality of dividers in the first time slot of one 
cycle time and storing the content of said adder in other 
time slots of the cycle time; 

a register file, connected to said plurality of dividers for 
storing the virtual finishing time generated by said 
dividers; 

a sorter, connected to said plurality of selecting registers 
for generating the smallest virtual finishing time stored 
in the plurality of selecting registers and the series 
number of said selecting register storing the smallest 
virtual finishing time; and 

an adder, connected to said sorter and said register file for 
updating the virtual finishing time stored in one select- 
ing register which owns the smallest virtual finishing 
time in a time slot. 

16. The apparatus of claim 12, wherein said plurality of 
storing devices of said input ports can be implemented by a 
memory means. 

17. The apparatus of claim 16, wherein said plurality of
storing devices can be divided into $k \times N$ virtual output
queues, wherein k represents the number of service grades
of said crossbar switch and N represents the number of input
ports of said switching apparatus; each input packet is stored
in one of the virtual output queues of said switching
apparatus corresponding to the serial number of the output port.

***** 